ABSTRACT

An analysis of the mission of the National Space Science Data Center (NSSDC) is 
made. The requirements that follow are designed to meet the mission objectives of the 
NSSDC. These requirements are not necessarily exhaustive, the intention being to provide 
a base for comment and discussion. Based upon these requirements, conclusions are made 
about the hardware, software, and design studies to be conducted. 

REQUIREMENTS FOR THE NATIONAL SPACE SCIENCE
DATA CENTER INFORMATION SYSTEM


INTRODUCTION 

The requirements for the National Space Science Data Center (NSSDC) Information
System stem from an understanding of its mission. This mission can best be understood
by considering the following quotations:

Nobel Prize winner Professor Richard Feynman has stated that “the most 
dramatic moments in the development of physics are those in which great 
syntheses take place, where phenomena which previously had appeared to 
be different are suddenly discovered to be but different aspects of the 
same thing. The history of physics is the history of such syntheses, and 
the basis of the success of physical science is mainly that we are able to 
synthesize.” (Ref. 1.) 

More recently, Dr. James Vette, the Director of NSSDC, has noted that “the 
processes [in environmental sciences] being observed are not completely understood, and 
they interact with each other in a vast complexity of ways. Consequently, the results vary 
with time and location. Large volumes of data are necessary to obtain patterns, interactions, 
and relations. As new ideas develop in understanding the phenomena, scientists will want to 
analyze further much of the existing data.” (Ref. 2.) Dr. Vette goes on to say, “It is neces- 
sary to study the analyzed and/or reduced data from many experiments in order to obtain a 
fairly comprehensive description of the space environment. This translation or synthesis into 
useful data summaries, compilations, or environments is a natural professional activity for 
those space scientists associated with the Data Center.” 

The primary mission of NSSDC, then, is to provide the means for the dissemination 
and analysis of space science data beyond that provided by the original experimenter. As 
such, the NSSDC is responsible for the active collection, organization, storage, announce- 
ment, retrieval, dissemination, and exchange of space science data. 

A quote from a report of the President’s Science Advisory Committee further amplifies
the mission of NSSDC:

“The activities of the most successful centers are an intrinsic part of science 
and technology. The centers not only disseminate and retrieve information; 
they create new information. . .The process of sifting through large masses of
data often leads to new generalizations. . .In short, knowledgeable scientific 
interpreters who can collect relevant data, review a field, and distill informa- 
tion in a manner that goes to the heart of a technical situation are more help 
to the over-burdened specialist than is a mere pile of relevant documents. 

Such knowledgeable scientific middlemen who themselves contribute to 
science are the backbone of the information (analysis) center; they make an 
information center a technical institute rather than a technical library. The 
essence of a good technical information center is that it is operated by highly 
competent working scientists and engineers— people who see in the operation 
of the center an opportunity to advance and deepen their own personal con- 
tact with their science and technology.” (Ref. 3.) 

The great 19th century physicist J. C. Maxwell was involved in two great syntheses 
during his lifetime. He combined the laws of electricity and magnetism with the laws of the 
behavior of light. He also had a great deal to do with the synthesis of the phenomena of heat 
and mechanics. If NSSDC can contribute, however small that contribution may be, to a
great synthesis of the 20th century, then NSSDC will have succeeded in its mission. A
flexible, dynamic, and versatile information system is the key to its success.

REQUIREMENTS FOR THE NSSDC INFORMATION SYSTEM 

The requirements that follow are designed to meet the mission objectives of the 
NSSDC. These are not necessarily distinct, nor is the list meant to be exhaustive. The
intention is to provide a base for comment and discussion. In this manner, new requirements
can be generated and a more effective information system produced.

It Must Be Able to Handle Large Varieties and Amounts of Data 

The information system must not only be able to handle space science data, but 
other data as well. These other data may pertain to information about the spacecraft, experi- 
ments, funding, requests, technical references, schedules, etc. This is more fully described in 
the discussion on the requirements for management and other support information. 

The space science data received at the NSSDC are basically either reduced data or
analyzed data. The distinction among raw, reduced, and analyzed data is important
to an understanding of the Data Center concepts. The definition of these types of data is
given in Ref. 2, and their evolution in the space science satellite data flow is explained in
Ref. 4. An indication of the size of the NSSDC data base and the diversity of storage media
can be obtained from Figure 1.

It has been estimated that within the next few years, NSSDC will receive annually 
about 10,000 magnetic tapes, 150,000 linear feet of roll charts, and 2,000 100-ft. rolls of 
microfilm. It is probable that more than a trillion bits of space science data will be received 
annually. In addition, NSSDC can expect to receive a large quantity of extra-terrestrial 
photographs within the next few years. These estimates are shown in Figure 2. 
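As a rough consistency check on the trillion-bit estimate, the sketch below computes the raw capacity of 10,000 full reels. The recording density (800 characters per inch) and the 6 data bits per character of 7-track tape are assumptions made for illustration, not figures from this report, and inter-record gaps are ignored, so the result is an upper bound.

```python
# Rough upper bound on the annual volume of magnetic-tape data.
# ASSUMPTIONS (illustrative, not from the report): 800 characters per inch,
# 6 data bits per character (7-track recording), and no inter-record gaps.
TAPE_LENGTH_FT = 2400       # reel length listed in Figure 1
CHARS_PER_INCH = 800        # assumed recording density
DATA_BITS_PER_CHAR = 6      # assumed: 6 data bits + 1 parity bit per frame
TAPES_PER_YEAR = 10_000     # estimate quoted in the text


def bits_per_tape(length_ft: int, cpi: int, bits_per_char: int) -> int:
    """Raw data capacity of one reel, ignoring gaps between records."""
    return length_ft * 12 * cpi * bits_per_char


per_tape = bits_per_tape(TAPE_LENGTH_FT, CHARS_PER_INCH, DATA_BITS_PER_CHAR)
annual = per_tape * TAPES_PER_YEAR
print(f"{per_tape:,} bits per tape")    # 138,240,000 bits per tape
print(f"{annual:.2e} bits per year")    # 1.38e+12 bits per year
```

Inter-record gaps would lower the real figure, but the order of magnitude, about 10¹² bits per year, is consistent with the estimate in the text.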
VOLUME OF DATA AT NSSDC

MEDIUM                                   AUGUST 1967   SEPTEMBER 1968
SHEETS AND BOUND VOLUMES, SHEETS             175,000          215,000
DIGITAL MAGNETIC TAPES, ½" × 2400'               291            2,000
MICROFILM, 100-FT ROLLS                        7,800           10,621
PHOTOGRAPHIC FILMS:
  9½" WIDTH, LINEAR FEET                      14,000           18,000
  70-mm WIDTH, LINEAR FEET                    33,200          143,500
  35-mm WIDTH, LINEAR FEET                         0           81,000
  4 × 5 INCH, EACH                             2,100            2,400
  8 × 10 INCH, EACH                                0              400
  16 × 20 INCH, EACH                              93               93
  20 × 24 INCH, EACH                           2,200            5,100
PHOTOGRAPHIC PRINTS:
  9½" WIDTH, LINEAR FEET                           0            9,000
  70-mm WIDTH, LINEAR FEET                         0            6,000
  8 × 10 INCH                                    600            2,500
  11 × 14 INCH                                   200              500
  16 × 20 INCH                                    93               93
  20 × 24 INCH                                 2,200            3,700

Figure 1 - Growth of the Data Base at NSSDC.

The machine-sensible (magnetic tapes, disc packs, etc.) space science data will have 
been produced by various computers using a variety of operating systems. The data will also 
contain various formats and structures and will represent many disciplines such as astronomy, 
fields and particles, planetary atmospheres, etc. 

In order to fulfill its mission, NSSDC must process, store, and transform this diverse 
data. It must verify all input to the system, detect errors or omissions, and perform quality 
control. It must provide for an efficient exchange of data from a large and versatile group 
of experimenters to a diverse user community. 
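The verification step can be made concrete with a few mechanical checks on each incoming data set: missing values, time tags out of order, gaps in coverage, and physically implausible values. The sketch below assumes a simple (time, value) record layout, a known nominal sampling interval, and a valid value range; all three are illustrative assumptions, and real quality control would depend on the experiment.

```python
import math


def verify(records, expected_step, valid_range):
    """Return a list of (index, problem) pairs for a time-ordered data set.

    records: list of (time, value) pairs; value may be None if missing.
    expected_step: nominal time between samples (assumed known per data set).
    valid_range: (low, high) bounds for a physically plausible value.
    """
    problems = []
    low, high = valid_range
    prev_time = None
    for i, (t, v) in enumerate(records):
        if v is None or (isinstance(v, float) and math.isnan(v)):
            problems.append((i, "missing value"))
        elif not low <= v <= high:
            problems.append((i, f"value {v} outside {valid_range}"))
        if prev_time is not None:
            if t <= prev_time:
                problems.append((i, "time tag out of order"))
            elif t - prev_time > 1.5 * expected_step:
                problems.append((i, f"gap of {t - prev_time} before this record"))
        prev_time = t
    return problems


data = [(0, 4.0), (60, None), (120, 5.5), (300, 4.8), (290, 99.0)]
for index, problem in verify(data, expected_step=60, valid_range=(0.0, 50.0)):
    print(index, problem)
```

Reporting problems rather than silently discarding records leaves the decision about each omission or error to the staff performing quality control.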


COMPLETED AND PLANNED SPACE PHOTOGRAPHIC MISSIONS

Project                                 Quantity of Photographs*
Ranger                                  10,000
Surveyor (individual frames)            100,000
Surveyor (mosaic photographs)           1,200
Lunar Orbiter
Mariner 4                               200
Mariner ’69                             1,000
Mariner ’71                             10,000
Mariner ’73                             N/A
Apollo 8                                1,200
Subsequent Apollo Flights**
Orbiting Astronomical Observatory

*Quantities of photographs for proposed missions are gross estimates.
**Includes photographs of returned lunar samples.

Figure 2 - Flight Projects Yielding Extra-terrestrial Photographs.

It Must Prepare These Data for a Multitude of Different Uses

The user requires that the space science data be in a form that is acceptable to him.
He may want to have a magnetic tape in a certain format to be processed by a specified
computer using a specified operating system. In order to solve this problem, NSSDC is
attempting to establish an interchangeability-compatibility subsystem with the following
characteristics (Ref. 5):

1. Hardware independence. The format should not be oriented to any computer or 
storage medium. Insofar as possible, it should represent the information content 
without taking advantage of special techniques, word formats, or hardware 
features. 

2. Densely packed. Since there is a high processing and maintenance overhead with 
the recorded data, all recorded bits should convey information. Bits should not 
be retained if they serve only to fill out a word for a given computer. This feature 
becomes more important as one considers the transition to other, more expensive 
forms of mass storage. 

3. Efficiency and ease of use. The data should be easy to process with a minimal 
amount of programmer housekeeping. Ideally, the reformatted tapes should 
require no more processing than the original tapes on the generating computer. 

4. Suitability for archival purposes. The tapes should be self-documenting, labeled, 
and designed so that bad areas may be reconstructed or easily bypassed. 


5. Easy transformation to other formats. Provision should be made to facilitate the 
conversion to a format directly processible by an arbitrary computer. 
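The dense-packing and self-documentation characteristics can be illustrated with a small sketch. It packs non-negative integer samples into only as many bits as the stated width, behind a short big-endian header that records the bit width and sample count, so a reader needs no knowledge of the writing computer's word length. The header layout, the `DPK1` tag, and the function names are hypothetical; this is not the NSSDC interchange format itself.

```python
import struct

# Hypothetical header: format tag, bits per sample, sample count.
# Big-endian (">") so the byte order does not depend on the host machine.
HEADER = struct.Struct(">4sBI")
MAGIC = b"DPK1"  # hypothetical format tag


def pack_samples(samples, width):
    """Pack non-negative integers into `width` bits each, first sample most significant."""
    acc = 0
    for s in samples:
        if not 0 <= s < (1 << width):
            raise ValueError(f"sample {s} does not fit in {width} bits")
        acc = (acc << width) | s
    nbits = width * len(samples)
    pad = (-nbits) % 8  # pad only to the next byte boundary
    body = (acc << pad).to_bytes((nbits + pad) // 8, "big")
    return HEADER.pack(MAGIC, width, len(samples)) + body


def unpack_samples(blob):
    """Recover the samples using only what the header says."""
    magic, width, count = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise ValueError("not a densely packed data set")
    body = blob[HEADER.size:]
    acc = int.from_bytes(body, "big") >> (len(body) * 8 - width * count)
    mask = (1 << width) - 1
    return [(acc >> (width * (count - 1 - i))) & mask for i in range(count)]


if __name__ == "__main__":
    data = [5, 0, 1023, 512]
    blob = pack_samples(data, width=10)
    print(len(blob))                      # 9-byte header + 5 data bytes = 14
    print(unpack_samples(blob) == data)   # True
```

Four 10-bit samples occupy 5 bytes of data here instead of four full machine words. A complete archival format would also need record labels and a way to resynchronize after a bad tape area, which this sketch does not attempt.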









Instead of requiring all the scientific data, the user may want only a subset of this
information printed in a prescribed format. He may even request that the data be
preprocessed, perhaps with the use of mathematical and other subroutines. With large data
banks and rapid access, the user may want to experiment with the scientific data in real time
(on-line). He may want the assistance of graphics, plotting routines, correlation routines,
light pens, etc. It is even possible for the scientist to relay some of his knowledge to the
information system and thus produce a symbiosis. This is more fully described in the discussion
on the requirement for furthering the effective use of data from space science experiments.
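Extracting a subset of the data and printing it in a prescribed format might look like the sketch below: the user names a time interval, the variables he wants, and the column widths and precision. The record fields (`time`, `flux`, `alt_km`) and the sample values are hypothetical stand-ins for a real data set.

```python
from dataclasses import dataclass


@dataclass
class Record:
    time: float    # hypothetical: seconds from start of data set
    flux: float    # hypothetical measured quantity
    alt_km: float  # hypothetical spacecraft altitude


def select(records, t_start, t_end, fields):
    """Return only the requested fields for records inside [t_start, t_end]."""
    return [
        tuple(getattr(r, f) for f in fields)
        for r in records
        if t_start <= r.time <= t_end
    ]


def format_rows(rows, widths, decimals):
    """Render rows as fixed-width columns, as a user might prescribe."""
    return [
        "".join(f"{value:{w}.{d}f}" for value, w, d in zip(row, widths, decimals))
        for row in rows
    ]


data = [Record(0.0, 1.25, 520.0), Record(10.0, 3.5, 515.2), Record(20.0, 2.75, 510.9)]
rows = select(data, 5.0, 25.0, ["time", "flux"])
for line in format_rows(rows, widths=[8, 10], decimals=[1, 3]):
    print(line)
```

Keeping selection and formatting as separate steps lets the same subset be printed, plotted, or passed to further preprocessing.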

It Must Provide for Simple Communication with the Scientist 

The first commercially available large-scale digital computer was installed at the 
Census Bureau in early 1951. It was delivered with essentially no software. As programmers 
gained experience, they developed subroutines, assemblers, compilers, generators, operating 
systems, etc. All this was very nice for the professional programmers, but it did not materially
assist the scientist in communicating with the computer. The scientist would like to describe 
his problem in a language and discipline that is part of his everyday activities. Specifically, 
this means that six conclusions may be reached about the software: 

1. The computer routines should be problem-oriented as well as technique-oriented.

2. The routines should be oriented toward the type of data structures familiar to the 
scientist rather than toward the internal structure of the computer. 

3. The routines should assume that the user is quite knowledgeable with respect to 
his research problem and relatively naive with respect to the software of the 
computer. 

4. In addition to error diagnostics, other man-machine interface aids should be in- 
corporated within the routines to bridge the gap between a very demanding com- 
puter and a naive user! 

5. The command language should be “forgiving,” allowing the user to restate any 
command in error. 

6. The system should operate by assisting its users in becoming increasingly proficient 
as they gain experience in the use of the system. The system should provide a 
dialogue with the user so that any reasonable questions can be addressed to the
system. This would cause the system to respond with instructions to the requester 
and give him a set of choices from which to make a selection. 
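Points 5 and 6 can be illustrated with a small, forgiving command interpreter: an unrecognized command is not fatal; instead, the system offers the closest valid commands as choices and invites the user to restate it. The command vocabulary is hypothetical, and ranking choices with Python's standard `difflib.get_close_matches` is only one possible approach.

```python
import difflib

# Hypothetical command vocabulary for a data-retrieval dialogue.
COMMANDS = {
    "LIST": "list the data sets available for a spacecraft",
    "SELECT": "choose a data set and time interval",
    "PLOT": "display the selected data graphically",
    "HELP": "show these instructions",
}


def respond(command_line):
    """Return the system's reply to one line typed by the user."""
    words = command_line.strip().upper().split()
    if not words:
        return "Please type a command, or HELP for a list of choices."
    verb = words[0]
    if verb == "HELP":
        return "\n".join(f"{name}: {text}" for name, text in COMMANDS.items())
    if verb in COMMANDS:
        return f"OK: {verb} accepted."
    # Forgiving behavior: offer choices instead of rejecting outright.
    choices = difflib.get_close_matches(verb, list(COMMANDS), n=3, cutoff=0.5)
    if choices:
        return f"'{verb}' is not a command. Did you mean: {', '.join(choices)}? Please restate."
    return f"'{verb}' is not a command. Type HELP to see your choices."


print(respond("selct dataset"))
print(respond("plot"))
```

A fuller system would also remember which commands a user has mastered and shorten its explanations accordingly, in the spirit of point 6.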

It Must Provide the Scientist with Rapid Access to Any Element of the Data Base

It has already been noted that large volumes of data are necessary to obtain patterns, 
interactions, and relations. Moreover, if the scientist is to observe a relationship from one 
phenomenon to another, it is clear that he may want to have access to any element of the 
data base. It is equally clear that if the scientist is not to lose patience with the system, the
response time should be compatible with human intellectual response time. To be sure, the
system need not respond with a uniform time increment to all requests, since there are cases
where a short response time is not necessary. Consider, for example, the case of browsing.
The user is not sure what he wants; the display of various pieces of the information system
helps him formulate his requirements and make a selection.

The cost of providing a rapid response is likely to be quite high. Therefore, it may 
be desirable to have the user specify his required response time. 
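If each user states the response time he requires, the system can serve requests in order of their deadlines. The sketch below uses earliest-deadline-first ordering on a heap; the class name, request descriptions, and times are invented for illustration, and the report does not prescribe a particular scheduling rule.

```python
import heapq
import itertools


class RequestQueue:
    """Serve requests in order of the response deadline each user specifies."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps arrival order stable

    def submit(self, now, required_response_s, description):
        deadline = now + required_response_s
        heapq.heappush(self._heap, (deadline, next(self._seq), description))

    def next_request(self):
        """Pop the request whose deadline comes soonest."""
        deadline, _, description = heapq.heappop(self._heap)
        return deadline, description

    def __len__(self):
        return len(self._heap)


q = RequestQueue()
q.submit(now=0, required_response_s=86_400, description="reformat tapes for mailing")
q.submit(now=0, required_response_s=5, description="plot selected interval")
q.submit(now=1, required_response_s=60, description="list data sets")
while q:
    print(q.next_request())
```

Requests that can tolerate a long delay simply sink to the back of the queue, which is one way to avoid paying for rapid response where it is not needed.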

It Must Provide the Means to Further the Effective Use of Data from 
Space Science Experiments 

The research scientist is not necessarily a professional programmer, nor need he be.
Therefore, the computer software system should be “transparent” to him. The system
should provide every possible tool to enhance his research effort. This implies an on-line
information system, a wide assortment of readily available and useful subroutines, display
equipment, graphic capability, etc. Every conceivable aid should be considered and
added to the system if practical.

Particularly desirable would be the use of Interactive Graphics (display oriented 
man-machine interaction). A great deal of work has been done recently in this connection. 
(Ref. 6.) Of direct interest is a paper which describes the “Analysis and Display of Physics 
Data” using interactive graphics. (Ref. 7.) This analysis is accomplished through a class of 
computer programs known as SUMX, whose principal function is to produce statistical sum- 
maries of experimental data. 

The original version of SUMX was written at the University of California, Lawrence 
Radiation Laboratory, Berkeley, California, using the IBM 2250 console for on-line commu- 
nications. The program capitalizes on the unique ability of a display device to present data 
in forms and at speeds that cannot be attempted on conventional output devices, and thus 
provides a useful interactive mode of computer use. 
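As a rough indication of what a statistical summary of experimental data involves, the sketch below computes a count, mean, standard deviation, and fixed-bin histogram, and prints a crude character histogram in place of a graphics display. It is a generic illustration with invented sample values, not a reconstruction of SUMX.

```python
import statistics


def summarize(values, lo, hi, nbins):
    """Count, mean, standard deviation, and histogram over [lo, hi)."""
    if not values:
        raise ValueError("no data to summarize")
    width = (hi - lo) / nbins
    counts = [0] * nbins
    outside = 0
    for v in values:
        if lo <= v < hi:
            # min() guards against floating-point rounding at the top edge
            counts[min(int((v - lo) // width), nbins - 1)] += 1
        else:
            outside += 1
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "histogram": counts,
        "outside": outside,
    }


def text_histogram(counts, lo, width):
    """Crude character display, standing in for a graphics console."""
    return [f"{lo + i * width:6.1f} | {'*' * c}" for i, c in enumerate(counts)]


energies = [1.2, 2.8, 3.1, 3.3, 4.9, 5.0, 5.2, 7.7, 12.0]  # invented sample data
s = summarize(energies, lo=0.0, hi=10.0, nbins=5)
print(s["n"], round(s["mean"], 2), s["histogram"], s["outside"])
for line in text_histogram(s["histogram"], 0.0, 2.0):
    print(line)
```

On an interactive display, the user could change the bin limits or the selection and see the new summary immediately, which is the advantage the text attributes to display devices.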

It is not difficult to imagine the new dimension of service that could become avail- 
able to the scientific community by allowing it to interact with the information system 
through a display device. It is easy to learn about new devices if they are potentially bene- 
ficial to us. Project MAC was initiated several years ago at the MIT Computation Center 
with the broad goal of investigating new ways in which on-line use of computers can aid 
people in their individual intellectual work, whether it be research, engineering design, 
management, or education. One conclusion of Project MAC is that an “essential part of the
research effort is the evolutionary development of a large, multiple-access computer system that
is easily and independently accessible to a large number of people, and truly responsive to
their individual needs.” As an ultimate goal of Project MAC, “one envisions an intimate
collaboration between man and computer system in the form of a real-time dialogue where
both parties contribute their best capabilities.” (Ref. 8.)






It Must Allow Many Scientists to Communicate with the Information
System at “the Same Time”

It would be very nice if a computer and all its resources were made available 100% of 
the time to a user during his communication with the system. This indeed was done during 
the early days of high-speed electronic computers. However, for reasons which we do not 
need to go into, this was a very expensive procedure. This was especially true as computers 
became faster. To alleviate this situation, batch processing and operating systems were in- 
troduced. One negative effect was the introduction of a barrier between the user and the 
computer. As computers became still faster, it was observed that only a small fraction of the
central processor’s time was required to solve a user’s problem. This led to multi-programming.
When third-generation computers became available with their flexible input-output capability,
time-sharing became a reality to the general public.* What this essentially means is
that many users have access to the computer at apparently the same time. The response 
time obviously depends upon the central processor speed, the number of “simultaneous” 
users, and the type of computation required. If the environment is such that a scientist can 
perform his functions within a response time that is not excessively large, then time-sharing is
the solution for many scientists since they may have access to the information system in near 
real time. This is indeed the situation at NSSDC. It is envisioned that scientists will not only 
require access to the data base at the same time, but also that they will want to communicate 
with each other through the computer system. Remote terminals may be required through- 
out the United States at various research centers and possibly at the universities for teaching 
and instruction. Professor Corbato has suggested that multiple-access computers have
lowered “the barrier of man-machine interaction by at least two orders of magnitude.” (Ref. 9.)
It suffices to say that this is the direction for NSSDC to follow. 
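How response time grows with the number of “simultaneous” users can be seen in a toy round-robin time-slicing model: each user's request needs a fixed amount of processor time, and the processor gives each active user a short quantum in turn. The quantum, the service demands, the common arrival time, and the neglect of swapping overhead are all simplifying assumptions.

```python
from collections import deque


def round_robin_finish_times(demands_ms, quantum_ms):
    """Simulate round-robin time slicing; return each user's completion time.

    demands_ms[i] is the processor time user i's request needs. All requests
    arrive at time 0 and swapping overhead is ignored (simplifying assumptions).
    """
    remaining = list(demands_ms)
    finish = [0] * len(demands_ms)
    ready = deque(range(len(demands_ms)))
    clock = 0
    while ready:
        user = ready.popleft()
        run = min(quantum_ms, remaining[user])
        clock += run
        remaining[user] -= run
        if remaining[user] > 0:
            ready.append(user)  # back of the line for another quantum
        else:
            finish[user] = clock
    return finish


for users in (1, 10, 50):
    times = round_robin_finish_times([200] * users, quantum_ms=50)
    print(users, "users: worst response", max(times), "ms")
```

With identical requests the worst-case response grows in proportion to the number of users, which is why processor speed and load determine whether time-sharing keeps response times acceptable.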

It Must Provide Tools For Assisting the NSSDC Scientist to Modify 
and Expand the Data Base 

It has been estimated that within the next few years more than one trillion bits will 
be added to the active data base per year. It is obvious then that file structuring techniques 
and data storage techniques must be able to handle and cope with this massive data input to 
the information system. The speed and efficiency with which the data base can be modified 
and expanded often determines the cost or feasibility of the entire system. Charles Meadow 
stated this clearly when he said: 

“A system designer should never lose sight of the fact that well-maintained 
files are as much a user requirement as the ability to do high-speed search- 
ing, although he will rarely hear a user say so.” (Ref. 10.) 
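One simple way to keep a growing file well maintained and still searchable is a sorted index that is updated as each new block of data arrives. The sketch below keeps (data set, start time) keys in order with Python's `bisect` module; the class name, key layout, and storage-location strings are hypothetical.

```python
import bisect


class TimeIndex:
    """Sorted index from (data set, start time) to a storage location.

    Keeping the index ordered as blocks arrive lets the file grow
    incrementally while lookups remain a binary search.
    """

    def __init__(self):
        self._keys = []       # sorted list of (data_set, start_time)
        self._locations = []  # parallel list of storage locations

    def add(self, data_set, start_time, location):
        key = (data_set, start_time)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._locations[i] = location  # replace an existing entry
        else:
            self._keys.insert(i, key)
            self._locations.insert(i, location)

    def find(self, data_set, t):
        """Location of the last block of `data_set` starting at or before time t."""
        i = bisect.bisect_right(self._keys, (data_set, t)) - 1
        if i >= 0 and self._keys[i][0] == data_set:
            return self._locations[i]
        return None


idx = TimeIndex()
idx.add("MAG-1", 0, "tape 0001, file 1")
idx.add("MAG-1", 86400, "tape 0001, file 2")
idx.add("ION-2", 0, "tape 0107, file 1")
print(idx.find("MAG-1", 100000))
print(idx.find("MAG-1", -5))
```

Inserting into a Python list costs time proportional to the size of the index, so an index on the scale of the NSSDC holdings would more likely use a tree-structured organization; the sorted list only shows the idea.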

Joseph Becker and Robert Hayes have noted that “an information system must exist
as a structure before it can respond to needs.” They go on to say, “What is required, it would
appear, is a recognition of system design as a problem distinct from operational and
equipment interests as such, with the basic purpose of matching the two for the most
effective total system.” (Ref. 11.)

*Time-sharing was shown to be feasible by Project MAC.

It Must Maintain Normal Operational Capabilities During the Phaseover
from One Level of Capability to the Next

It is apparent that a large information system, such as that required for NSSDC, will not
be developed in one fell swoop. It is more likely that the system will be constructed in
phases. Implicit in these phases are modifications to the program specifications and
design. Changes in hardware and operational procedures are also likely. Therefore, any
planning that is done should take these changes into consideration so that the NSSDC
operational functions continue to be performed in support of its users. The computer software
should be modular in construction, employing a widely used procedure-oriented language
wherever possible.

It Must Effectively Purge the Active Data Base 

The following points, which should be considered when planning for the retirement
of data, have been listed previously (Ref. 2):

1. Large volume plus cost of maintenance plus fixed resources dictate the orderly
retirement of data.

2. Early in its life, various forms of the data are useful, e.g., time ordered, space
ordered, etc.

3. Data can be reduced without losing information content: 

• By eliminating certain forms of data 

• By removing derived variables 

• By keeping only the significant number of bits, not the full computer words 

4. Information content of data can be reduced: 

• By breaking out event data from background 

• By averaging the background over suitable time intervals 

• By preserving only special data for historical purposes 

• By preserving only outstanding geophysical event data 

• By compressing data into analyzed forms so that general understanding of 
phenomena is retained 
Because of the massive data supply to the information system, every available tool
should be considered for reducing the number of bits while still retaining the basic information.
Perhaps additional tools can be found in information theory.
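Two of the reductions listed, breaking out event data from background and averaging the background over suitable time intervals, can be combined as in the sketch below. The threshold rule (a fixed multiple of the interval median), the interval length, and the sample data are illustrative assumptions; choosing them for a real data set is a scientific judgment.

```python
import statistics


def reduce_series(samples, interval, threshold_factor):
    """Split a time series into averaged background plus retained events.

    samples: list of (time, value) pairs, time-ordered, values assumed positive.
    interval: number of consecutive samples averaged into one background value.
    threshold_factor: a sample is kept as an event if it exceeds this multiple
        of its interval's median (an assumed, illustrative rule).
    """
    background, events = [], []
    for start in range(0, len(samples), interval):
        block = samples[start:start + interval]
        median = statistics.median(v for _, v in block)
        quiet = []
        for t, v in block:
            if v > threshold_factor * median:
                events.append((t, v))  # keep full detail for events
            else:
                quiet.append(v)
        if quiet:
            background.append((block[0][0], statistics.fmean(quiet)))
    return background, events


data = [(t, 10.0) for t in range(12)]
data[7] = (7, 95.0)  # one invented burst
bg, ev = reduce_series(data, interval=4, threshold_factor=3.0)
print(len(data), "samples ->", len(bg), "background values +", len(ev), "event")
print(ev)
```

Here twelve samples reduce to three background averages and one retained event, while the event itself is preserved at full resolution.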

It Must Have Growth Potential 

The history of computing has shown that there are a great many instances where 
tasks outgrow the equipment. When this has happened there inevitably has been a large 
cost and much agony in redesigning, reanalyzing, reprogramming, and redebugging. There- 
fore, the need for planned expansion is vital. One can envision the need for more memory, 
more input-output equipment, more display capability, more remote terminals, more com- 
puting speed, etc. 

It Must Provide for Management and Other Support Information 

The following subsystems are presently operational and will form an integral part of 
the information system to be developed: 

1. Automated Internal Management (AIM) 

The AIM subsystem is the heart of the present management system. Detailed 
descriptions of the spacecraft, experiment(s), and data set(s) are entered along 
with acquisition activity for subsequent effective retrieval. Some of the tasks 
done by AIM are: projecting volume of incoming data sets; performing logical 
searches to answer queries; providing action reminders and other control functions; 
and producing various management and user-oriented reports. 

2. Request Accounting Status and History (RASH) 

RASH performs the bookkeeping on requests for data and services. It is de- 
signed to display up-to-date information relating to number of requests, their 
status, estimated and actual costs, processing time, and necessary action reminders. 
This information can be retrieved by data set, requester, affiliation, date of re- 
quest, date filled, request agent, status of request, etc. 

3. Technical Reference File (TRF) 

The TRF has been designed to provide the most extensive bibliographic informa- 
tion in space science. The access to the references is provided by accession number, 
title, author, and keywords (controlled and uncontrolled). Boolean logic searches 
are allowed. 

4. Machine-Oriented Data Subsystem (MODS) 

MODS is used for data set analysis, generation of data set catalogs, and the use 
of special techniques where the interchange of information is inhibited by the 
diversity of hardware. For example, included in MODS will be a package called
PIFT (Package for Information Formatting and Transformation), which will produce
densely packed machine- and media-independent data sets that may be accessed
in the man-machine mode.
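A Boolean keyword search of the kind the TRF allows can be sketched with an inverted index from keywords to accession numbers. The accession numbers, keywords, and the simple query form (all-of, any-of, none-of) are hypothetical; how TRF queries are actually expressed is not described here.

```python
from collections import defaultdict


class KeywordIndex:
    """Inverted index: keyword -> set of accession numbers."""

    def __init__(self):
        self._postings = defaultdict(set)
        self._all = set()

    def add(self, accession, keywords):
        self._all.add(accession)
        for kw in keywords:
            self._postings[kw.lower()].add(accession)

    def search(self, all_of=(), any_of=(), none_of=()):
        """Accessions matching (AND of all_of) AND (OR of any_of) AND NOT none_of."""
        result = set(self._all)
        for kw in all_of:
            result &= self._postings.get(kw.lower(), set())
        if any_of:
            either = set()
            for kw in any_of:
                either |= self._postings.get(kw.lower(), set())
            result &= either
        for kw in none_of:
            result -= self._postings.get(kw.lower(), set())
        return sorted(result)


trf = KeywordIndex()
trf.add("68-0012", ["magnetosphere", "trapped radiation", "electrons"])
trf.add("68-0031", ["magnetosphere", "solar wind"])
trf.add("68-0047", ["trapped radiation", "protons"])
print(trf.search(all_of=["trapped radiation"], none_of=["protons"]))
print(trf.search(any_of=["solar wind", "protons"]))
```

Because each keyword maps directly to its references, a query touches only the postings it names rather than scanning every bibliographic record.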

In addition to the above subsystems, the NSSDC is presently developing a
system to collect, identify, store, and retrieve high-quality satellite photographs. 

The Extra-terrestrial Photographic Information Clearinghouse (EPIC) will have 
a data base consisting of support information and keywords which describe the 
photographic information. The main purpose of EPIC is to provide a powerful 
tool in serving requests for photographic data. 

Many of the functions presently performed by NSSDC could be done on-line
interactively with a computer. The cycle from receiving a scientific article to
analyzing it, identifying it, filling out forms, keypunching, performing computer
runs, printing, correcting, etc., is quite long. Cost-effectiveness studies are
necessary to determine and identify those functions which can be most efficiently
performed with the latest equipment and techniques.


CONCLUSIONS 

It is still too early to design the future hardware and software of the NSSDC. Yet, 
if the requirements that have been described are to be met, there are certain conclusions that 
can be made. 

Hardware Conclusions 

The computer associated with NSSDC must have a multi-programming capability 
with a large number of variable tasks. It must have a telecommunication capability. It should 
service many remote terminals that have graphic display capability. The CRT consoles 
should be used for input to the system as well as output. It is interesting to note that by the 
year 1970, annual production in the United States of CRTs will be more than 100,000 units.
(Ref. 12.) The computer should also have a multi-processing capability so that growth can
be assured with a minimum of disturbance. Finally, it is vital that very large direct-access
devices be linked to the computer. An on-line device with a capacity of 10¹² bits is available
today (Ref. 13); this is only about one year’s worth of the scientific data that NSSDC can
expect to receive. However, all is not lost: as Professor Feynman pointed out in a talk several
years ago at the California Institute of Technology, “there’s room at the bottom,” implying
memory possibilities at the molecular level! (Ref. 14.)

Software Conclusions 

In addition to providing the systems software in a modular, dynamic, and flexible 
way, there needs to be a great deal of emphasis placed upon: 
1. The development of user-oriented languages, depending upon the discipline that
is being investigated and on the application of the user

2. The utilization of mathematical and scientific subroutines to assist the user with 
his research 

3. The utilization of special plotting and graphic subroutines 

4. The development of special routines to help the scientist communicate efficiently 
and effectively with the massive data base. 

Study Areas 

It can be concluded from the requirements of the NSSDC information system that 
the design of the system will be enhanced by carefully studying the following areas: 

1. How should the data base be structured?

2. What types of search procedures should be employed? 

3. What data should be stored on-line, and in what form?

4. What devices should be used? 

5. What compression techniques should be used? 

6. How much automatic abstracting and document classification should be done? 

7. How much and in what ways shall we use character recognition equipment? 

8. What are the best languages for the scientist to use in communicating with 
the system? 

9. How much interaction should there be and how shall it be done? 

10. What are the requirements of the researcher in his communication with the 
computer? 

11. What computer hardware terminals and displays are best?

12. How should the software be designed? 

13. What tasks should be done with special-purpose hardware? 

In short, cost-effectiveness studies should be made in a great many areas so that the 
best system can be designed with the least amount of money. Particular emphasis should be 
placed upon a study to determine if NSSDC requires a large, special-purpose computer with a
specially designed operating system and software. 


SUMMARY 

The following requirements for the NSSDC information system were arrived at after 
discussing the mission of the NSSDC: 

• It must be able to handle large varieties and amounts of data. 

• It must prepare these data for a multitude of different uses. 

• It must provide for simple communication with the scientist. 

• It must provide the scientist with rapid access to any element of the data base. 

• It must provide the means to further the effective use of data from space science 
experiments. 

• It must allow many scientists to communicate with the information system at 
“the same time.” 

• It must provide tools for assisting the NSSDC scientists to modify and expand
the data base.

• It must maintain normal operational capabilities during the phaseover from one 
level of capability to the next. 

• It must effectively purge the active data base. 

• It must have growth potential. 

• It must provide for management and other support information. 

Assuming the validity of these requirements, conclusions were drawn about the
hardware and software and also about various studies that should be made in order to
enhance the performance of the system.